Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Experimental GPU support for Windows #103

Merged
merged 1 commit into from
Feb 1, 2020

Conversation

mworchel
Copy link
Contributor

This PR enables CUDA compilation for Windows and therefore introduces experimental GPU support. I re-organized parts of the CMakeLists.txt to have an (arguably) more ordered structure. Pybind support is now added by linking against an exported target instead of using the call to pybind11_add_module (which would interfere with usage of cuda_add_library).

Pathtracing does still not work. However, albedo and deferred rendering work fine. So does backpropagation, at least for the GPU mode (see #93).

I didn't test the changes on linux or ios, so that's up for testing.

# The "-undefined dynamic_lookup" is a hack for systems with
# multiple Python installed. If we link a particular Python version
# here, and we import it with a different Python version later.
# likely a segmentation fault.
Copy link

@bjorn3 bjorn3 Jan 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sentence seems to be missing the beginning. (pre-existing)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably a merge error or I was just typing non-sense. Will fix it at some point.

@BachiLi
Copy link
Owner

BachiLi commented Feb 1, 2020

@mworchel This works fine on my side and I'll merge it. I'm curious about the path tracing issue. What is wrong about it?

@BachiLi BachiLi merged commit 59c1bb5 into BachiLi:master Feb 1, 2020
@mworchel
Copy link
Contributor Author

mworchel commented Feb 2, 2020

Thanks for merging.

Regarding the path tracing, I noticed that whenever I try to do it in GPU mode under Windows, the application will crash (in CPU mode it works fine, however the backprop does not work there). I invested some time and the crash is caused when calling update_active_pixels(https://github.com/BachiLi/redner/blob/master/pathtracer.cpp#L384). The call to thrust::copy_if in this function fails at device synchronization. The issue is potentially caused by an earlier call to the CUDA API but the error only appears at this point because of the sync.

I ran a cuda-memcheck but I think I didn't see any specific errors. I could try to place device sync points after the preceding functions like accumulate_path_contribs or intersect to narrow it further down.

@mworchel mworchel deleted the windows-gpu-support branch February 2, 2020 12:24
@BachiLi
Copy link
Owner

BachiLi commented Feb 2, 2020

Interesting. The most suspicious candidate for this is the sample_point_on_light procedure, which performs a binary search over an array, which is allocated on another array, where the array is allocated on CUDA unified memory (no typo here). I suspect that the unified memory works differently under windows and is causing this issue. This might also be the reason why some of the older GPUs caused crash in redner. Not sure about the solution yet if this is the cause.

@mworchel
Copy link
Contributor Author

mworchel commented Feb 3, 2020

I added multiple sync points and interestingly, the call to accumulate_path_contribs is the one that fails, causing an unspecified launch failure. I commented out nearly everything in the path_contribs_accumulator only being left with something like

if (rendered_image != nullptr) {
    auto nd = channel_info.num_total_dimensions;
    auto d = channel_info.radiance_dimension;
    rendered_image[nd * pixel_id + d]     += 0.5 /*weight * path_contrib[0]*/;
    rendered_image[nd * pixel_id + d + 1] += 0.5 /*weight * path_contrib[1]*/;
    rendered_image[nd * pixel_id + d + 2] += 0.5 /*weight * path_contrib[2]*/;
}
if (edge_contribs != nullptr) {
    edge_contribs[pixel_id] += sum(weight * 0.5/* path_contrib*/);
}

and with this, the forward pass succeeds. I'll try to comment in stuff one after another and pin the error down some more. Maybe you already have some kind of suspicion? "Luckily" it even fails with a 1x1 image, so I can watch one pixel in isolation.

EDIT: Commenting in this line (https://github.com/BachiLi/redner/blob/master/path_contribution.cpp#L38)

auto bsdf_val = bsdf(material, shading_point, wi, wo, min_rough);

triggers the error. Maybe some out of bounds access on the textures?

EDIT2: Hmm ok, when I comment out lines https://github.com/BachiLi/redner/blob/master/material.h#L404-L441 (specular component), the forward pass succeeds and I get the correct diffuse colors.

EDIT3: When I comment out these lines https://github.com/BachiLi/redner/blob/master/material.h#L435-L440 the whole thing doesn't compile anymore with some template error in a thrust header. Ok, what?

@mworchel
Copy link
Contributor Author

mworchel commented Feb 3, 2020

Ok, so I was able to fix the error (currently only trying the forward pass) by factoring the Fresnel term into two distinct terms, so replacing

auto F = specular_reflectance +
    (1.f - specular_reflectance) *
    pow(max(1.f - cos_theta_d, Real(0)), 5.f);

by

Vector3 F1 = specular_reflectance;
Vector3 F2 = (1.f - specular_reflectance) * pow(max(1.0 - cos_theta_d, Real(0)), 5.0);
auto F = F1 + F2;

I'm not sure what the root cause is, but I'd guess it has something to do with accessing references to temporary objects in the overloaded operators. Although the code looks fine to me.

EDIT: The fix is extremely brittle, though. If you move the part pow(max(1.0 - cos_theta_d, Real(0)), 5.0) into its own variable, it won't compile anymore (thrust template errors as above) and if you replace 1.0 and 5.0 with 1.f and 5.f you get the CUDA runtime errors.

@BachiLi
Copy link
Owner

BachiLi commented Feb 3, 2020

I am not using any arcane expression template magic here, so I don't think it's the reference problem you mentioned. This is weird to a point that I start to suspect it's a compiler bug in nvcc.

@mworchel
Copy link
Contributor Author

mworchel commented Feb 3, 2020

It definitely looks like something is very wrong. Especially the compiler error when commenting out certain lines is weird. I wouldn't be surprised if might be on nvcc's side, yes. The only stuff in the code that looks remotely suspicious as I'm not sure it's guaranteed to work by the cpp standard is the operator[] of TVectorN and TFrame (I mean the *(&x + i); stuff). It assumes the members are laid out linearly without padding but I think a compiler is allowed to do weird stuff like aligning the members to certain byte boundaries where this code might break?! At this point I'm just desperately trying to come up with some logical explanation..

@mworchel
Copy link
Contributor Author

mworchel commented Feb 4, 2020

Ok, the compiler errors might be caused by some incremental build stuff going wrong or some other voodoo. It's not as consistently reproducible as I thought.

However, I was able to fix the forward pathtracing pass with even less changes. Just replacing

// Schlick's approximation
auto F = [...] * pow(max(1.f - cos_theta_d, Real(0)), 5.f);

by

// Schlick's approximation
auto F = [...] * pow(max(1.f - cos_theta_d, Real(0)), 5.0);

so, replacing 5.f by 5.0. I really don't know which pow function is used in the above case, as it's called with arguments (double, float) but that seems to cause the issue. The latter case calls pow(double, double) and everything is fine.

I cannot believe this is the fix but making sure the (double, double) version is called everywhere, even allows to use path tracing with backpropagation now!

I'll test it on one more machine and will send a PR later.

@mworchel
Copy link
Contributor Author

This is weird to a point that I start to suspect it's a compiler bug in nvcc.

@BachiLi, I was just reminded of this PR and the discussion when I tried to compile Redner today and faced the same weird compilation error again. After all, you were right! It's a bug in NVCC which should be fixed in CUDA 11:
NVIDIA/thrust#1090

Leaving this here for documentation purposes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants